Predicting unseen triphones with senones

Authors

  • Mei-Yuh Hwang
  • Xuedong Huang
  • Fil Alleva

Abstract

In large-vocabulary speech recognition, the decoder often encounters triphones that are not covered in the training data. These unseen triphones are usually represented by the corresponding diphones or context-independent monophones. We propose to use decision-tree-based senones to generate the needed senonic baseforms for unseen triphones. A decision tree is built for each individual Markov state of each phone, and the leaves of the trees constitute the senone codebook. To find the senone associated with a Markov state of any triphone, we traverse the corresponding tree until we reach a leaf node, which represents a senone. We used the DARPA 5,000-word speaker-independent Wall Street Journal dictation task to evaluate the proposed method. The word error rate was reduced by 11% when unseen triphones were modeled by the decision-tree-based senones. When there were at least 5 unseen triphones in each test utterance, the error rate could be reduced by more than 20%.

This research was sponsored by the Defense Advanced Research Projects Agency, DOD, and monitored by the Space and Naval Warfare System Command under contract N00039-91-C-0158, ARPA Order 7239. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government.
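
The tree traversal described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the phone classes, the yes/no questions, the tree shapes, the senone identifiers, and the assumption of three Markov states per phone are all hypothetical and chosen only to show how one tree per (phone, state) pair can yield a senonic baseform for a triphone that never appeared in training.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical phone classes for the yes/no questions (not taken from the paper).
NASALS = frozenset({"m", "n", "ng"})
VOWELS = frozenset({"aa", "ae", "eh", "iy", "uw"})


@dataclass
class Leaf:
    """A leaf of a decision tree; it stands for one senone."""
    senone_id: int


@dataclass
class Node:
    """An internal node asking: is the left/right context phone in phone_class?"""
    side: str                      # "left" or "right"
    phone_class: frozenset
    yes: Union["Node", Leaf]
    no: Union["Node", Leaf]


def find_senone(tree, left, right):
    """Walk from the root to a leaf and return that leaf's senone id."""
    node = tree
    while isinstance(node, Node):
        context = left if node.side == "left" else right
        node = node.yes if context in node.phone_class else node.no
    return node.senone_id


# One tree per (phone, state index). Toy trees for the phone "k" only,
# assuming three Markov states per phone (an assumption of this sketch).
TREES = {
    ("k", 0): Node("left", NASALS, Leaf(0),
                   Node("right", VOWELS, Leaf(1), Leaf(2))),
    ("k", 1): Leaf(3),
    ("k", 2): Node("right", VOWELS, Leaf(4), Leaf(5)),
}


def senonic_baseform(left, phone, right, n_states=3):
    """Return the senone sequence for the triphone left-phone+right."""
    return [find_senone(TREES[(phone, s)], left, right)
            for s in range(n_states)]


if __name__ == "__main__":
    # The triphone n-k+iy needs no training examples of its own:
    # the questions about its contexts are enough to reach a leaf.
    print(senonic_baseform("n", "k", "iy"))
```

Because every question concerns only the identity of the neighboring phones, any triphone reaches some leaf in each tree, which is what lets the method cover contexts absent from the training data.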

Similar resources

Predicting Unseen Triphones with Senones - Speech and Audio Processing, IEEE Transactions on

In large-vocabulary speech recognition, we often encounter triphones that are not covered in the training data. These unseen triphones are usually backed off to their corresponding diphones or context-independent phones, which contain less context yet have plenty of training examples. In this paper, we propose to use decision-tree-based senones to generate needed senonic baseforms for these uns...

Creation of unseen triphones from seen triphones, diphones and phones

With limited training data, infrequent triphone models for speech recognition will not be observed in sufficient number. In this report, a speech production approach is used to predict the characteristics of unseen triphones by using a transformation technique in the parametric representation of a formant speech synthesiser. Two techniques are currently tested. In one approach, unseen triphones ...

Creating unseen triphones by phone concatenation in the spectral, cepstral and formant domains

A technique for predicting triphones by concatenation of diphones or monophones is studied. The models are connected using linear interpolation of parameter trajectories. Previous work on formant parameters is extended to filter channels and cepstrum coefficients. Preliminary results indicate that the proposed technique works well also in these domains. In both cases, the approximation ...

Creation of unseen triphones from diphones and monophones using a speech production approach

With limited training data, infrequent triphone models for speech recognition will not be observed in sufficient number. In this report, a speech production approach is used to predict the characteristics of unseen triphones by concatenating diphones and/or monophones in the parametric representation of a formant speech synthesiser. The parameter trajectories are estimated by interpolation betw...

Training production parameters of context-dependent phones for speech recognition

A representation form of acoustic information in a trained phone library at the production parametric as well as the spectral level is described. The phones are trained in the parametric domain and are transformed to the spectral domain by means of a synthesis procedure. By this twofold description, potentially more powerful procedures for speaker adaptation and generation of unseen triphones c...

Journal:
  • IEEE Trans. Speech and Audio Processing

Volume 4

Publication date: 1996